Our work has focused on developing new cost-sensitive feature acquisition and classification
algorithms, mapping these algorithms onto camera networks, and creating a test bed of video data
and vision algorithm implementations on which to run them. First, we will describe a
new  algorithm  that  we  have  developed  for  feature  acquisition  in  Hidden  Markov  Models  (HMMs). 
This  is  particularly  useful  for  inference  tasks  involving  video  from  a  single  camera,  in  which  the 
relationship  between  frames  of  video  can  be  modeled  as  a  Markov  chain.  We  will  describe  this 
algorithm  in  the  context  of  using  background  subtraction  results  to  identify  portions  of  video  that 
contain  a  moving  object.  Next,  we  will  describe  new  algorithms  that  apply  to  general  graphical 
models.  These  can  be  tested  using  existing  test  sets  that  are  drawn  from  a  range  of  domains  in 
addition  to  sensor  networks. 


1  Feature  Acquisition  within  a  Single  Camera 

We  have  completed  a  preliminary  study  aimed  at  performing  cost-sensitive  label  acquisition  for 
background  subtraction  in  a  single  video  stream.  We  describe  a  small  set  of  initial  results  that  we 
find  encouraging. 

We  consider  the  problem  of  using  background  subtraction  to  determine  whether  there  is  a 
moving  object  in  a  video  (see  Figure  1).  Objects  are  detected  by  first  performing  background 
subtraction,  and  then  using  the  size  of  the  largest  connected  component  of  foreground  pixels  as  a 
feature.  Generally,  this  is  small  when  noise  causes  scattered  foreground  pixels,  and  larger  when 
there  is  a  moving  object.  We  can  integrate  information  temporally  using  a  Hidden  Markov  Model 
with  two  states  that  reflect  the  presence  or  absence  of  a  moving  object.  Each  state  gives  rise  to  a 
different  distribution  for  the  size  of  the  largest  connected  component. 

In  this  simple  setting  we  can  examine  the  problem  of  performing  label  acquisition  to  control 
the  use  of  cheap  and  expensive  background  subtraction  algorithms.  For  a  cheap  algorithm,  we 
simply  threshold  the  difference  between  each  frame  and  the  previous  one,  marking  pixels  with  large 
intensity  differences  as  foreground.  As  a  more  expensive  algorithm  we  use  a  mixture  of  Gaussians 
to  model  the  background,  and  mark  foreground  pixels  that  are  unlikely  to  be  background  according 
to  this  model  [7].  The  question  we  face  is  whether  we  can  run  the  cheap  algorithm  on  all  frames 
but  use  the  expensive  algorithm  sparingly,  and  still  achieve  accuracy  similar  to  that  obtained  when 
we run both algorithms everywhere.
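
As an illustration of the cheap feature described above, the following is a minimal sketch of frame differencing followed by a largest-connected-component measurement. It assumes grayscale frames stored as nested lists of integer intensities, a hypothetical intensity threshold, and 4-connectivity; the function names are illustrative and are not taken from our implementation.

```python
from collections import deque


def frame_difference_mask(prev_frame, frame, threshold):
    """Mark pixels whose intensity changed by more than `threshold` as foreground."""
    return [[abs(p - c) > threshold for p, c in zip(prev_row, row)]
            for prev_row, row in zip(prev_frame, frame)]


def largest_component_size(mask):
    """Size of the largest 4-connected group of foreground pixels (0 if none)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best
```

Scattered noise pixels yield a small value, while a compact moving object yields a large one, which is why this single number can serve as the per-frame observation.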

If we idealize the situation slightly, and assume that the expensive algorithm always produces
an accurate result, then [3] shows that we can use dynamic programming (DP) to find the optimal
locations at which to apply the expensive algorithm. Unfortunately, given n nodes in an HMM, their
algorithm requires $O(n^5)$ computation. This algorithm is suitable for an HMM that has few nodes,
when  obtaining  new  measurements  is  very  expensive,  but  not  for  a  video  containing  thousands 
of  frames.  However,  we  have  experimented  with  snippets  of  video  containing  dozens  of  frames. 
For  these  we  can  use  the  DP  algorithm  to  select  frames  for  expensive  background  subtraction, 
producing  results  that  are  twice  as  accurate  as  when  we  simply  apply  expensive  processing  to  the 
same number of frames selected uniformly in the sequence. This is encouraging, but significantly
understates the potential of intelligent label acquisition.

Figure 1: Our background subtraction experiments use video from outdoor scenes, as shown above.

If  we  divide  a  video  into  short  snippets,  the  primary  potential  for  intelligent  processing  is 
probably  in  deciding  which  snippets  deserve  a  lot  of  extra  attention  and  which  do  not.  We  have 
therefore  devised  a  new  algorithm  that  applies  expensive  processing  to  every  50th  frame  and  then 
runs  DP  on  each  snippet.  We  have  developed  a  novel  method  to  combine  these  results  efficiently 
and  optimally  to  tell  us  in  which  snippets  we  should  apply  expensive  processing,  and  where  in 
these  snippets  to  process.  This  gives  us  an  algorithm  that,  as  overhead,  requires  application  of 
expensive  processing  to  a  constant  fraction  of  the  frames,  but  in  return  reduces  computation  time 
from $O(n^5)$ to $O(n)$, yielding an algorithm suitable for long video streams.

In section 1.1, we describe how we map a Hidden Markov Model to the problem of identifying
interesting events in a single video stream. In section 1.2, we formulate the problem more
concretely. In section 1.3, we describe a dynamic programming algorithm that solves the problem
optimally. In section 1.4, we discuss how we modify the algorithm of section 1.3 to handle long
video sequences. In section 1.5, we describe and discuss the simple experiment we have performed
so far.

Figure  2:  The  HMM  model


1.1  Hidden  Markov  Model  and  Probabilistic  Inference 

To start, we notice that whether or not a frame of video is interesting is closely related to whether
its neighboring frames are interesting. This suggests that we map the problem of identifying
significant sequences of video onto a Hidden Markov Model (HMM), which will enable us to do
probabilistic inference. In our model, each frame in the sequence is associated with a state
variable. The state is either interesting or not interesting. The observation emitted by the state
variable corresponds to the features extracted from the frame. For simplicity, one cheap feature and
one expensive feature are considered for now. The structure of this model is shown in Figure 2.
Variables whose labels start with “S” are state variables, and variables whose labels start with “O”
are observations, where cheap features are labeled “C” and expensive features are labeled “E”. After
learning the parameters of this model, we can use a standard inference algorithm [4] to determine
the state of each frame based on its features. This sets up the problem of determining from which
frames we should extract expensive features. The following section gives a more concrete
formulation of this problem.
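
To make the inference step concrete, here is a minimal sketch of forward-backward smoothing for a discrete-state HMM. It assumes that the emission model has already been evaluated, so the input is a per-frame list of likelihoods p(observation | state); the function name, argument layout, and the example parameters in the usage note are illustrative assumptions, not the learned values from our experiments.

```python
def forward_backward(prior, transition, likelihoods):
    """Posterior state marginals for each frame of an HMM.

    prior[s]          : P(S_1 = s)
    transition[r][s]  : P(S_{t+1} = s | S_t = r)
    likelihoods[t][s] : p(observation at frame t | S_t = s), assumed not all zero
    """
    n, k = len(likelihoods), len(prior)

    # Forward pass, normalised at every step to avoid numerical underflow.
    alphas = []
    for t in range(n):
        if t == 0:
            a = [prior[s] * likelihoods[0][s] for s in range(k)]
        else:
            prev = alphas[-1]
            a = [likelihoods[t][s] * sum(prev[r] * transition[r][s] for r in range(k))
                 for s in range(k)]
        z = sum(a)
        alphas.append([v / z for v in a])

    # Backward pass, also normalised (the scale cancels in the posterior).
    betas = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [sum(transition[s][r] * likelihoods[t + 1][r] * betas[t + 1][r]
                 for r in range(k))
             for s in range(k)]
        z = sum(b)
        betas[t] = [v / z for v in b]

    # Combine the two passes into per-frame posteriors.
    posteriors = []
    for a, b in zip(alphas, betas):
        g = [x * y for x, y in zip(a, b)]
        z = sum(g)
        posteriors.append([v / z for v in g])
    return posteriors
```

With states ordered as (not interesting, interesting), a frame whose own observation is ambiguous can still receive a high posterior for "interesting" when its neighbors provide strong evidence, which is the temporal integration the model is meant to provide.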

1.2  Problem  Formulation 

Our problem formulation follows section 2 of [3]. First, we want to model the fact that
observations are informative. We consider a sequence of state variables
$S = \{X_1, \ldots, X_n\}$ in the HMM, and we define a class of local reward functions $R$ on the
marginal probability distributions of the variables. The local reward can be evaluated using
probabilistic inference techniques, and the total reward will then be the sum of all local rewards.
To define a local reward for classification purposes, we use a functional on the max-marginal
$P^{\max}(X_j \mid O)$,

$$R_j(X_j \mid O) = R_j(P^{\max}(X_j \mid O)) = \sum_{o} P(o) \left[ P^{\max}(x_j^* \mid o) - P^{\max}(\bar{x}_j \mid o) \right],$$

where $O$ is the set of observed state variables, $o$ is their value, $x_j$ is a value of $X_j$,
$x_j^* = \arg\max_{x_j} P^{\max}(x_j \mid o)$, and
$\bar{x}_j = \arg\max_{x_j \neq x_j^*} P^{\max}(x_j \mid o)$. Second, we want to capture the
constraint that observations are expensive. This can mean that each observation $X_j$ has an
associated positive penalty $C_j$ that effectively decreases the reward. Third, it is also possible
to define a budget $B$ for selecting observations, where each one is associated with an integer
cost $\beta_j$. Finally, our formulation of the optimization problem is

Maximize

$$J(O) = R(O) - C(O) = \sum_{j} R_j(X_j \mid O) - \sum_{i} C_i$$

Subject to

$$\sum_{i} \beta_i \le B$$

where $j$ is the index over all the state variables $S$, $i$ is the index over observed state
variables $O$, and $B$ is the total amount of budget for the subsequence.
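
As a small worked example of the local reward, consider a binary state variable and a single observed variable with two possible outcomes. All numbers below are hypothetical and chosen only to show how the expected margin is computed.

```latex
% Hypothetical outcome probabilities: P(o = 0) = 0.7, P(o = 1) = 0.3.
% Hypothetical max-marginals of X_j:
%   under o = 0: P^{\max}(x_j^* \mid o) = 0.8,  P^{\max}(\bar{x}_j \mid o) = 0.2
%   under o = 1: P^{\max}(x_j^* \mid o) = 0.55, P^{\max}(\bar{x}_j \mid o) = 0.45
\begin{align*}
R_j(X_j \mid O)
  &= \sum_{o} P(o)\left[P^{\max}(x_j^* \mid o) - P^{\max}(\bar{x}_j \mid o)\right] \\
  &= 0.7\,(0.8 - 0.2) + 0.3\,(0.55 - 0.45) \\
  &= 0.42 + 0.03 = 0.45 .
\end{align*}
```

The reward is large when, for most outcomes, the best label is well separated from the runner-up, and small when the observation tends to leave the label ambiguous.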

In addition, the fact that the state variables in our HMM form a Markov chain simplifies this
local reward. By the conditional independence properties of the graphical model, the local
reward is simply $R(X_j \mid O) = R(X_j \mid X_j)$ in the case that $X_j \in O$. In the case that
$X_j$ is not in $O$, we have $R(X_j \mid O) = R(X_j \mid O_j)$, where $O_j$ is the subset of $O$
containing the closest ancestor and descendant of $X_j$ in $O$. This local reward simplification
plays a key role in the optimization algorithm in section 1.3.

1.3  Conditional  Plan 

To get the set of observations for the problem, we need to specify a query policy. We consider
the following conditional policy: we sequentially observe state variables in the HMM, pay the
penalty, and, depending on the observed values, select the next query as long as our budget
suffices. Putting this policy into the optimization problem above, our goal is to find a plan with
the highest expected reward, where, for each possible sequence of observations, the budget $B$ is
not exceeded. We call such an observation plan a conditional plan [3]. To solve this problem, the
objective function $J$ to be maximized is defined recursively:

The base case is defined on budget 0:

$$J(O = o; 0) = \sum_{X_j \in S} R_j(X_j \mid O = o) - C(O)$$

The recursive case is defined on budget limited to $k$:

$$J(O = o; k) = \max\Big\{ J(O = o; 0),\ \max_{X_j \notin O} \Big\{ \sum_{y} P(X_j = y \mid O = o)\, J(O = o, X_j = y; k - \beta_j) \Big\} \Big\}$$

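The optimal plan has reward $J(\emptyset; B)$. [3] provides a dynamic programming algorithm on a
sub-chain of state variables $X_a, \ldots, X_b$. The base case is defined on budget 0:

$$J_{a:b}(x_a, x_b; 0) = \sum_{j=a+1}^{b-1} R_j(X_j \mid X_a = x_a, X_b = x_b),$$

where $J_{a:b}(x_a, x_b; 0)$ means the optimal plan reward given $X_a = x_a$ and $X_b = x_b$ with
total budget 0. The recursive case defines $J_{a:b}(x_a, x_b; k)$, in which the total budget is $k$:

$$J_{a:b}(x_a, x_b; k) = \max\Big\{ J_{a:b}(x_a, x_b; 0),\ \max_{a<j<b} \Big\{ -C_j + \sum_{x_j} P(X_j = x_j \mid X_a = x_a, X_b = x_b) \Big[ R_j(X_j \mid x_j) + \max_{0 \le l \le k - \beta_j} \big( J_{a:j}(x_a, x_j; l) + J_{j:b}(x_j, x_b; k - l - \beta_j) \big) \Big] \Big\} \Big\}$$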
where the recursion iterates through the possible split points $j$, such that $a < j < b$, and
considers all possible budget allocations $l$ and $k - l - \beta_j$ for the two sub-chains to find
the maximum reward. Using the property that the reward is decomposable along the chain, the optimal
reward is obtained as $J_{0:n+1}(\emptyset; B)$. The state variables $X_0$ and $X_{n+1}$ referred
to in $J_{0:n+1}(\emptyset; B)$ are two dummy variables with $x_0 = 1$, $x_{n+1} = 1$, and
$R_0 = C_0 = \beta_0 = R_{n+1} = C_{n+1} = \beta_{n+1} = 0$. After this, the observation plan can be
obtained by backtracking the observation chosen at each step of the resulting conditional plan. For
details of this algorithm, please refer to section 5 in [3].
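
The sub-chain recursion can be sketched in a few lines under simplifying assumptions. The sketch below is not the algorithm of [3] in full generality: it assumes a stationary Markov chain with a strictly positive transition matrix, that observing a variable reveals its state exactly, that the local reward is the margin between the two most probable values of the conditional marginal (so an observed variable has reward 1), and that every observation has the same integer cost. It also ignores the cheap features. The helper name `make_subchain_planner` and its parameters are hypothetical.

```python
from functools import lru_cache


def make_subchain_planner(transition, penalty=0.0, cost=1):
    """Return J(a, xa, b, xb, k): expected reward of the best conditional plan
    between known endpoints X_a = xa and X_b = xb with budget k."""
    d = len(transition)
    # powers[m] is the m-step transition matrix; powers[0] is the identity.
    powers = [[[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]]

    def power(m):
        while len(powers) <= m:
            last = powers[-1]
            powers.append([[sum(last[r][t] * transition[t][c] for t in range(d))
                            for c in range(d)] for r in range(d)])
        return powers[m]

    def conditional(a, xa, b, xb, j):
        # P(X_j = x | X_a = xa, X_b = xb) for a < j < b.
        left, right = power(j - a), power(b - j)
        weights = [left[xa][x] * right[x][xb] for x in range(d)]
        total = sum(weights)
        return [w / total for w in weights]

    def margin(dist):
        top, second = sorted(dist, reverse=True)[:2]
        return top - second

    @lru_cache(maxsize=None)
    def plan_reward(a, xa, b, xb, k):
        # Base case: no further observations strictly between a and b.
        best = sum(margin(conditional(a, xa, b, xb, j)) for j in range(a + 1, b))
        if k < cost:
            return best
        for j in range(a + 1, b):  # candidate split point
            dist = conditional(a, xa, b, xb, j)
            value = -penalty
            for xj, prob in enumerate(dist):
                if prob == 0.0:
                    continue
                # Best split of the remaining budget between the two sub-chains.
                split = max(plan_reward(a, xa, j, xj, l)
                            + plan_reward(j, xj, b, xb, k - l - cost)
                            for l in range(k - cost + 1))
                value += prob * (1.0 + split)  # an observed X_j has margin 1
            best = max(best, value)
        return best

    return plan_reward
```

Memoizing on `(a, xa, b, xb, k)` is what turns the exponential space of conditional plans into a polynomial-time computation; the known endpoints play the role of the dummy variables.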

1.4  Subsections  and  Convexity  of  Reward 

1.4.1  Subsections 

Using the above dynamic programming algorithm, an optimal conditional plan can be obtained.
However, one issue with this approach is that the running time of this algorithm is
$d^3 B^2 (\tfrac{1}{6} n^3 + O(n^2))$, where $n$ is the number of state variables in the HMM and
$d$ is the maximum domain size of the state variables $X_1, \ldots, X_n$ ($d = 2$ in our case).
This algorithm is therefore only suitable for situations in which $n$ is fairly small, such as
$n = 50$ or $100$. In the case of video sequences, $n$ may be 100,000 or greater, requiring an
algorithm that is approximately linear in $n$.

To solve this problem, we consider the Markov property of our HMM, which says that the value of a
state variable depends directly only on the values of its neighboring state variables. So to
predict a state correctly, we may only need expensive features at some nearby positions rather than
from the whole sequence. Inspired by this, we divide the whole sequence into subsections, and run
the dynamic programming algorithm in each subsection to select expensive features there. Given
subsequences of constant length, this can be done in $O(n)$ time. The key question is then how to
allocate the available budget for processing among all of these subsequences.

1.4.2  Convexity  of  Reward 

To  run  the  dynamic  programming  algorithm,  we  may  need  a  budget  for  each  subsection.  A  simple 
scheme  is  to  uniformly  allocate  budget  to  each  subsection  using  the  total  budget.  However,  some 
sections  may  need  more  expensive  features  to  eliminate  false  positives  or  false  alarms.  At  the  same 
time,  many  other  subsequences  will  require  few,  if  any,  additional  expensive  features  because  they 
are  unlikely  to  contain  any  events  of  interest. 

We are therefore left with the following problem. For each subsequence, we can determine the
optimal feature acquisition plan for every possible budget, and the expected reward of each budget.
How do we allocate a single budget among all these subsequences to maximize our expected
reward? This problem has a simple solution for the special case in which the expected reward
from acquiring expensive features is monotonically increasing in the number of features acquired
and obeys a law of diminishing returns, so that each increase in the budget increases the reward
more and more slowly. We refer to this property as convexity of the reward (in standard
terminology, such a reward-budget curve is concave). In this case, we can allocate our budget
optimally by assigning available resources incrementally to whichever subsequence will benefit
most from them. We stress that this incremental approach assigns the budget optimally across
subsequences; within each subsequence the dynamic programming algorithm is then used to determine
the optimal set of observations for the assigned budget.

Figure 3: A convex curve (left) and a non-convex curve (right)

Empirically, we do find that the rewards of feature acquisition are convex for our problem. To
test this we obtained a sequence of 64,000 consecutive frames from a video viewing the bike track
around a lake. We divided the sequence into subsections, each of which contains fifty consecutive
frames. A slightly modified version of the dynamic programming algorithm (see section 1.5) was run
on each subsection with budgets from 0 to 9. We then plotted, for each subsection, a curve
describing how the optimal plan reward changes with the budget; we call this the reward-budget
(RB) curve. We find that the RB curve is always either strictly convex (see the left side of
Figure 3) or non-convex by a tiny amount (see the right side of Figure 3).
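
One simple way to quantify how far an RB curve departs from diminishing returns is to look at the largest amount by which a reward increment exceeds the increment before it. The function below is an illustrative sketch of such a check, assuming the curve is given as a list of rewards indexed by budget; it is not necessarily the test we applied.

```python
def diminishing_returns_violation(rb_curve):
    """Largest amount by which an increment exceeds its predecessor.

    rb_curve[k] is the optimal plan reward with budget k. A result of 0 means
    every increment is no larger than the one before it.
    """
    increments = [later - earlier for earlier, later in zip(rb_curve, rb_curve[1:])]
    gaps = [nxt - cur for cur, nxt in zip(increments, increments[1:])]
    return max([0] + gaps)
```

A small positive value corresponds to a curve that violates the property "by a tiny amount", and it can be compared against a tolerance when deciding whether the incremental allocation is safe to use.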

We can therefore use the following method to maximize the sum of reward increments over all
subsections. Let $N$ be the maximum budget in the RB curves for each subsection. We can compute
the reward increments in each subsection for budget increments from zero to one, from one to
two, ..., and from $N - 1$ to $N$. We sort all these increments in decreasing order, and the budget
for each subsection is determined by the number of increments it has among the top $B$ increments
of the sorted list. We call this method batch allocation of budget.
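
Batch allocation can be sketched as follows, assuming the RB curve of every subsection has already been computed and is stored as a list of rewards indexed by budget. The function name and the example numbers in the usage note are illustrative.

```python
def batch_allocate(rb_curves, total_budget):
    """Assign a budget to each subsection from precomputed RB curves.

    rb_curves[j][k] is the optimal plan reward of subsection j with budget k.
    Returns a list whose j-th entry is the budget given to subsection j.
    """
    increments = []
    for j, curve in enumerate(rb_curves):
        for k in range(1, len(curve)):
            # Reward gained by raising subsection j's budget from k - 1 to k.
            increments.append((curve[k] - curve[k - 1], j))

    # Largest increments first; the stable sort keeps each subsection's
    # increments in budget order when values tie.
    increments.sort(key=lambda item: item[0], reverse=True)

    budgets = [0] * len(rb_curves)
    for _, j in increments[:total_budget]:
        budgets[j] += 1
    return budgets
```

For example, with curves `[[0, 50, 80, 90], [0, 10, 15, 17], [0, 40, 60, 70]]` and a total budget of 4, the four largest increments are 50, 40, 30 and 20, so the first and third subsections each receive a budget of 2 and the second receives none.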

We  can  further  improve  on  this,  and  avoid  wasted  computation  with  the  following  algorithm: 

1. Initialize the budget of each subsection to zero;

2. Compute the increment of reward for each subsection for increasing the budget from zero to
one;

3. Select the highest increment of reward, and add one unit of budget to its corresponding
subsection Q;

4. If the total budget for the whole sequence has been used up, terminate and use the current
subsection budget allocation as the final subsection budget allocation;

5. Compute the reward increment for subsection Q when its current budget is incremented by
one, and use it to replace the current reward increment for this subsection. Go back to step 3.


We  call  this  algorithm  dynamic  allocation  of  budget.  We  give  a  proof  of  this  algorithm  below  to 
show  that  it  gives  an  optimal  budget  allocation  to  maximize  the  increment  of  reward. 
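
The steps of dynamic allocation can be sketched with a priority queue. In this sketch, `reward_at(j, k)` is a hypothetical callback standing in for running the dynamic programming algorithm on subsection `j` with budget `k`; it is only called for the budgets the algorithm actually reaches, which is where the saving over batch allocation comes from. The cap `max_budget` is likewise an assumption of the sketch.

```python
import heapq


def dynamic_allocate(reward_at, num_subsections, total_budget, max_budget):
    """Greedy budget allocation that evaluates reward increments lazily."""
    budgets = [0] * num_subsections  # step 1

    # Step 2: first increment (budget 0 -> 1) of every subsection.
    heap = []
    if max_budget >= 1:
        for j in range(num_subsections):
            gain = reward_at(j, 1) - reward_at(j, 0)
            heapq.heappush(heap, (-gain, j))  # negate: heapq is a min-heap

    remaining = total_budget
    while remaining > 0 and heap:
        # Step 3: give one unit to the subsection with the largest increment.
        _, j = heapq.heappop(heap)
        budgets[j] += 1
        remaining -= 1
        # Step 5: replace that subsection's increment with its next one.
        if budgets[j] < max_budget:
            gain = reward_at(j, budgets[j] + 1) - reward_at(j, budgets[j])
            heapq.heappush(heap, (-gain, j))
    # Step 4: the loop ends once the total budget is used up.
    return budgets
```

When the increments of every subsection are non-increasing, this produces the same allocation as sorting all increments up front, while computing only the increments adjacent to the budgets actually assigned.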

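Problem: Given a total budget $B$, let $\Delta R_j^i$, where $i, j \in \mathbb{Z}^+$, be the
increment of reward of the optimal plan for subsection $j$ when its budget goes from $i - 1$ to
$i$. Then by the convexity assumption, we know that $\Delta R_j^i \ge \Delta R_j^{i+1}$ for every
possible $i$ and $j$. Let $\alpha_j$ be the budget for subsection $j$, and $R_j^{\alpha_j}$ be the
total increment of reward for subsection $j$ with budget $\alpha_j$. Then we have

$$R_j^{\alpha_j} = \begin{cases} 0 & \text{if } \alpha_j = 0 \\ \sum_{i=1}^{\alpha_j} \Delta R_j^i & \text{if } \alpha_j > 0 \end{cases}$$

Show that the dynamic allocation of budget algorithm maximizes $\sum_{j=1}^{N} R_j^{\alpha_j}$
subject to $\sum_{j=1}^{N} \alpha_j = B$, where $N$ is the total number of subsections.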

Proof: First, we observe that if $\Delta R_n^p > \Delta R_m^q$, then $\Delta R_n^p$ will be
selected before $\Delta R_m^q$ is selected. To show this, suppose $\Delta R_m^q$ is selected first.
Then it must be greater than or equal to the reward increment currently available for every
subsection at that moment, including the one for subsection $n$. Let that reward increment be
$\Delta R_n^w$. Since $\Delta R_n^p > \Delta R_m^q \ge \Delta R_n^w$, we have
$\Delta R_n^p > \Delta R_n^w$. By the convexity assumption, we know that $p < w$. Then it must be
that $\Delta R_n^p$ was selected before $\Delta R_n^w$ became available, and as a result before
$\Delta R_m^q$. But this contradicts our original assumption. This observation is denoted as
lemma 1.

Second, let the optimal budget allocation be the set $\{\beta_j\}$, where $\beta_j$ is the budget
for subsection $j$ (in this proof $\beta_j$ denotes a budget, not the observation cost of section
1.2), and let $\{l\}$ be the set of indices of those subsections whose budget is zero. Then by the
definition of $R_j^{\beta_j}$ we have
$\sum_j R_j^{\beta_j} = \sum_{k \notin \{l\}} \sum_{t=1}^{\beta_k} \Delta R_k^t$. Sorting
$\Delta R_k^t$ for all possible $t$ and $k$ in decreasing order, we can obtain
$\sum_j R_j^{\beta_j} = \sum_{b=1}^{B} \Delta T_b$, where each $\Delta T_b$ corresponds to a
$\Delta R_k^t$, and $\Delta T_b \le \Delta T_{b-1}$ for every possible $b$. Also, let $\Delta P_b$
be the $b$th reward increment selected by the algorithm, and $\gamma_j$ be the budget for
subsection $j$ computed by the algorithm. Then the total increment of reward obtained by the
algorithm is $\sum_j R_j^{\gamma_j} = \sum_{b=1}^{B} \Delta P_b$. The observation that
$\Delta T_b \le \Delta P_b$ for every possible $b$ is shown below. This observation is denoted as
lemma 2.

Suppose that $\Delta T_b > \Delta P_b$ for some $b$. First, we know that $\Delta T_b$ is equal to
some reward increment $\Delta R_n^p$, and the same holds for $\Delta P_b$. Also,
$\Delta T_c \ge \Delta T_b$ for each possible $c$ such that $c < b$, so
$\Delta T_c > \Delta P_b$ for each such $c$. Then by lemma 1, we know that $\Delta T_c$ for each
such $c$, together with $\Delta T_b$, will be selected before $\Delta P_b$ in the algorithm, and in
total there are $b$ of them. However, only $b - 1$ increments of reward are selected before
$\Delta P_b$, by the definition of $\Delta P_b$. So this creates a contradiction.

Finally, by lemma 2, we know that
$\sum_j R_j^{\gamma_j} = \sum_{b=1}^{B} \Delta P_b \ge \sum_{b=1}^{B} \Delta T_b = \sum_j R_j^{\beta_j}$.
Also, since $\{\beta_j\}$ is the optimal budget allocation, we have
$\sum_j R_j^{\gamma_j} \le \sum_j R_j^{\beta_j}$. As a result,
$\sum_j R_j^{\gamma_j} = \sum_j R_j^{\beta_j}$. The proof is completed.


1.5  Experiments 

Currently, only a simple experiment has been performed, on the 64,000 frames mentioned in the
previous section. The true state of each frame is labeled by hand. The features we use are the size
of the largest connected foreground component from two background subtraction algorithms: Frame
Difference (FD) [2] and Adaptive Gaussian Mixture Model (AGMM) [7]. After the background
subtraction, some morphological operations are performed before computing the size. Since AGMM
is more accurate than FD in detecting the foreground region but more time-consuming, we consider
features from AGMM as expensive features and features from FD as cheap features. To do
inference on the HMM, we extract the cheap feature at every frame and use the dynamic programming
method with subsections to determine on which frames we should sample expensive features. The
64,000 frames are divided into subsections, each of which consists of 50 consecutive frames.
[3] uses dummy state variables with known values at the beginning and end of the sequence before
running the dynamic programming algorithm. We instead directly use the true state of the first and
last frame in each subsection to run the algorithm. The local reward $R_j$ is computed based on the
marginal probability produced by cheap features, the penalty $C_j$ associated with an expensive
feature observation is always zero, and the observation cost $\beta_j$ for an expensive feature is
always one. The budget for each subsection is determined using the batch allocation method, and the
RB curves for each subsection are computed up to budget 9. To make the learned parameters of the
HMM conform to the distribution in the testing data, we train and test on the same data. The
inference error rates obtained using purely expensive or purely cheap features are shown in
Table 1. The error rates for our sampling method and the uniform sampling method under different
total budgets for the whole sequence are shown in Table 2.

Table 1: Error rates using only one kind of feature

Cheap Features    Expensive Features
0.0066            0.0018

Table 2: Error rates under different budgets for expensive features

Budget    Our Sampling    Uniform Sampling
1850      0.004           0.0059
910       0.004           0.0064
770       0.004           0.0065
570       0.0042          0.0065

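From these tables, we can see that by using a small number of expensive feature samples, our
sampling algorithm achieves better inference accuracy than uniform sampling: with a total budget of
570 expensive features for the 64,000 frames, the error rate is 0.0042, compared with 0.0065 for
uniform sampling. However, compared with the error rate of 0.0018 obtained using purely expensive
features, the error rate is still somewhat high for the budgets we have tried so far.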

1.6  Summary

The main purpose of this simple experiment has been to make our proposed work more concrete.
Even for a simple motion-detection task, we see there is a significant trade-off available between
accuracy and the amount of processing we perform. This trade-off becomes continuous if we
process all frames cheaply and some frames more accurately; by combining all the results using a
graphical model, expensive processing in some frames helps us to better analyze all the frames.
Given a fixed budget of accurate processing, making the most of this budget is a problem of label
acquisition. Performing effective label acquisition in a long video sequence is still an open
problem, even when we use a simple HMM as our graphical model. However, we have made significant
progress on this problem. Our proposed work aims to extend this simple example to more complex
vision tasks and richer graphical models.

2  Extension  to  More  Complex  Models 

The method we described in the previous section is applicable to HMMs, which are a reasonable
model for a single video sequence. However, when we have a network of cameras, we need to integrate
information from multiple sources and we need to make simultaneous decisions about whether
there was an object in any of the cameras and if so, in which ones. In such situations, we probably
need more complex graphical models, such as arbitrary Markov Random Fields. When the
underlying graphical model structure is irregular, the algorithms we described in the previous
section are no longer applicable. To address this issue, we have developed a new algorithm, called
Reflect and Correct (RAC) [1].

RAC  is  an  iterative  algorithm  that  first  finds  the  locations  where  an  incorrect  decision  is  made 
and  then  acquires  more  information  in  those  locations.  The  key  element  of  RAC  is  the  question  of 
how  to  figure  out  if  a  frame  is  misclassified.  We  answer  this  question  by  using  a  local  classifier 
that  makes  independent  decisions  for  each  frame  and  by  comparing  its  label  estimates  with  the 
estimates  of  the  graphical  model.  We  construct  some  features  using  the  comparison  of  the  estimates 
and  fit  a  classifier  that  can  predict  if  a  frame  is  misclassified. 

Our preliminary experiments with synthetic datasets and publication datasets are very
encouraging. RAC significantly outperformed (in terms of labeling accuracy) a viral-marketing-based
strategy [6] and previous approaches that are based on network structural properties such as
network clustering and node degree [5].